Recurrent Human Pose Estimation
We propose a novel ConvNet model for predicting 2D human body poses in an
image. The model regresses a heatmap representation for each body keypoint, and
is able to learn and represent both the part appearances and the context of the
part configuration. We make the following three contributions: (i) an
architecture combining a feed-forward module with a recurrent module, where the
recurrent module can be run iteratively to improve the performance, (ii) the
model can be trained end-to-end and from scratch, with auxiliary losses
incorporated to improve performance, (iii) we investigate whether keypoint
visibility can also be predicted. The model is evaluated on two benchmark
datasets. The result is a simple architecture that achieves performance on par
with the state of the art, but without the complexity of a graphical model
stage (or layers).
Comment: FG 2017. More info and demo:
http://www.robots.ox.ac.uk/~vgg/software/keypoint_detection
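The feed-forward-plus-recurrent design lends itself to a compact sketch. The following PyTorch snippet is a minimal illustration under assumed layer sizes and iteration count, not the authors' architecture: a feed-forward module predicts initial heatmaps, and a weight-shared recurrent module repeatedly refines them, with every iteration's output kept so auxiliary losses can supervise each step.

```python
# Minimal sketch of a feed-forward + iterated recurrent heatmap refiner.
# Layer sizes, kernel widths, and the iteration count are illustrative
# assumptions, not the configuration used in the paper.
import torch
import torch.nn as nn

class RecurrentPoseNet(nn.Module):
    def __init__(self, num_keypoints=16, feat_ch=64, iterations=3):
        super().__init__()
        self.iterations = iterations
        # Feed-forward module: image -> shared features -> initial heatmaps.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feat_ch, 7, padding=3), nn.ReLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(),
        )
        self.initial_head = nn.Conv2d(feat_ch, num_keypoints, 1)
        # Recurrent module: shares weights across iterations and sees both
        # the image features and the previous heatmap estimate.
        self.refine = nn.Sequential(
            nn.Conv2d(feat_ch + num_keypoints, feat_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, num_keypoints, 1),
        )

    def forward(self, img):
        feats = self.backbone(img)
        heatmaps = self.initial_head(feats)
        outputs = [heatmaps]  # auxiliary losses can supervise every iteration
        for _ in range(self.iterations):
            heatmaps = self.refine(torch.cat([feats, heatmaps], dim=1))
            outputs.append(heatmaps)
        return outputs

model = RecurrentPoseNet()
preds = model(torch.randn(1, 3, 256, 256))
print([p.shape for p in preds])  # one heatmap tensor per iteration
```

Because the refinement module is weight-shared, running it for more iterations at test time can improve accuracy without adding parameters.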
Gradient-based Uncertainty for Monocular Depth Estimation
In monocular depth estimation, disturbances in the image context, like moving
objects or reflecting materials, can easily lead to erroneous predictions. For
that reason, uncertainty estimates for each pixel are necessary, in particular
for safety-critical applications such as automated driving. We propose a post
hoc uncertainty estimation approach for an already trained and thus fixed depth
estimation model, represented by a deep neural network. The uncertainty is
estimated with the gradients which are extracted with an auxiliary loss
function. To avoid relying on ground-truth information for the loss definition,
we present an auxiliary loss function based on the correspondence of the depth
prediction for an image and its horizontally flipped counterpart. Our approach
achieves state-of-the-art uncertainty estimation results on the KITTI and NYU
Depth V2 benchmarks without the need to retrain the neural network. Models and
code are publicly available at https://github.com/jhornauer/GrUMoDepth.
Comment: Accepted to ECCV 2022
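The flip-consistency idea can be sketched compactly. Below is a minimal, hypothetical PyTorch version: `depth_net` is a stand-in for any frozen, pretrained depth model, and the input gradients of the auxiliary loss serve as the per-pixel uncertainty cue, which simplifies the paper's feature-gradient extraction.

```python
# Sketch of gradient-based uncertainty via a flip-consistency auxiliary loss.
# `depth_net` is a placeholder for a real pretrained network; using input
# gradients as the uncertainty cue is a simplification of the paper's
# feature-gradient extraction.
import torch
import torch.nn as nn

depth_net = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.Conv2d(16, 1, 3, padding=1)
)
depth_net.eval()  # the depth model stays fixed; no retraining happens

def gradient_uncertainty(img):
    img = img.clone().requires_grad_(True)
    depth = depth_net(img)
    # Auxiliary loss: the depth of the horizontally flipped image, flipped
    # back, should match the depth of the original (no ground truth needed).
    depth_flip = torch.flip(depth_net(torch.flip(img, dims=[3])), dims=[3])
    aux_loss = (depth - depth_flip).abs().mean()
    # Gradients of the auxiliary loss w.r.t. the input act as a per-pixel
    # sensitivity/uncertainty score; the model weights stay untouched.
    grads, = torch.autograd.grad(aux_loss, img)
    return depth.detach(), grads.abs().sum(dim=1, keepdim=True)

depth, uncertainty = gradient_uncertainty(torch.rand(1, 3, 64, 64))
print(depth.shape, uncertainty.shape)
```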
Automated Automotive Radar Calibration With Intelligent Vehicles
While automotive radar sensors are widely adopted and have been used for
automatic cruise control and collision avoidance tasks, their application
outside of vehicles is still limited. As they have the ability to resolve
multiple targets in 3D space, radars can also be used for improving environment
perception. This application, however, requires a precise calibration, which is
usually a time-consuming and labor-intensive task. We therefore present an
approach for automated and geo-referenced extrinsic calibration of automotive
radar sensors that is based on a novel hypothesis filtering scheme. Our method
does not require external modifications of a vehicle and instead uses the
location data obtained from automated vehicles. This location data is then
combined with filtered sensor data to create calibration hypotheses. Subsequent
filtering and optimization recovers the correct calibration. Our evaluation on
data from a real testing site shows that our method can correctly calibrate
infrastructure sensors in an automated manner, thus enabling cooperative
driving scenarios.
Comment: 5 pages, 4 figures, accepted for presentation at the 31st European
Signal Processing Conference (EUSIPCO), September 4 - September 8, 2023,
Helsinki, Finland
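The geometric core of such a calibration, aligning radar detections of a vehicle with its geo-referenced positions, can be viewed as rigid point-set registration. The snippet below solves that alignment with the Kabsch method in 2D; the paper's hypothesis generation and filtering around this step are not reproduced, and all data here are synthetic.

```python
# Sketch of the alignment step behind geo-referenced radar calibration:
# given radar detections of a vehicle and the vehicle's reported world-frame
# positions at matching timestamps, recover the sensor's rigid transform by
# least squares (Kabsch). The hypothesis filtering scheme is not shown.
import numpy as np

def estimate_extrinsic_2d(radar_pts, world_pts):
    """Fit R, t such that world_pts ~ R @ radar_pts + t (2D rigid)."""
    mu_r, mu_w = radar_pts.mean(axis=0), world_pts.mean(axis=0)
    H = (radar_pts - mu_r).T @ (world_pts - mu_w)   # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                         # enforce a rotation
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = mu_w - R @ mu_r
    return R, t

# Synthetic check: a known pose must be recovered from noisy detections.
rng = np.random.default_rng(0)
true_R = np.array([[0.0, -1.0], [1.0, 0.0]])         # 90 degree rotation
true_t = np.array([10.0, 5.0])
radar = rng.uniform(-20, 20, size=(50, 2))
world = radar @ true_R.T + true_t + rng.normal(0, 0.05, size=(50, 2))
R, t = estimate_extrinsic_2d(radar, world)
print(np.allclose(R, true_R, atol=0.01), np.round(t, 2))
```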
Pedestrian Environment Model for Automated Driving
Besides interacting correctly with other vehicles, automated vehicles should
also be able to react in a safe manner to vulnerable road users like
pedestrians or cyclists. For a safe interaction between pedestrians and
automated vehicles, the vehicle must be able to interpret the pedestrian's
behavior. Common environment models do not contain information such as body
poses that is needed to understand the pedestrian's intent. In this work, we
propose an
environment model that includes the position of the pedestrians as well as
their pose information. We only use images from a monocular camera and the
vehicle's localization data as input to our pedestrian environment model. We
extract the skeletal information with a neural network human pose estimator
from the image. Furthermore, we track the skeletons with a simple tracking
algorithm based on the Hungarian algorithm and an ego-motion compensation. To
obtain the 3D information of the position, we aggregate the data from
consecutive frames in conjunction with the vehicle position. We demonstrate our
pedestrian environment model on data generated with the CARLA simulator and the
nuScenes dataset. Overall, we reach a relative position error of around 16% on
both datasets.
Comment: Accepted for presentation at the 26th IEEE International Conference
on Intelligent Transportation Systems (ITSC 2023), 24-28 September 2023,
Bilbao, Bizkaia, Spain
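The tracking step can be illustrated with a small sketch: previous skeleton positions are shifted by an ego-motion estimate and matched to current detections with the Hungarian algorithm. The use of skeleton centers, the Euclidean cost, and the gating threshold are assumptions made for illustration.

```python
# Sketch of Hungarian-algorithm skeleton tracking with ego-motion
# compensation. Distance metric and threshold are illustrative assumptions.
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_skeletons(prev_centers, curr_centers, ego_shift, max_dist=50.0):
    """prev/curr_centers: (N,2)/(M,2) skeleton centers;
    ego_shift: 2D translation compensating the vehicle's own motion."""
    compensated = prev_centers + ego_shift           # ego-motion compensation
    cost = np.linalg.norm(compensated[:, None, :] - curr_centers[None, :, :],
                          axis=-1)                   # (N, M) distance matrix
    rows, cols = linear_sum_assignment(cost)         # Hungarian matching
    # Discard assignments too far apart to be the same pedestrian.
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= max_dist]

prev = np.array([[100.0, 200.0], [300.0, 220.0]])
curr = np.array([[310.0, 225.0], [105.0, 198.0]])
print(match_skeletons(prev, curr, ego_shift=np.array([4.0, 0.0])))
```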
Point Transformer
In this work, we present Point Transformer, a deep neural network that
operates directly on unordered and unstructured point sets. We design Point
Transformer to extract local and global features and relate both
representations by introducing the local-global attention mechanism, which aims
to capture spatial point relations and shape information. For that purpose, we
propose SortNet, as part of the Point Transformer, which induces input
permutation invariance by selecting points based on a learned score. The output
of Point Transformer is a sorted, permutation-invariant feature list that
can directly be incorporated into common computer vision applications. We
evaluate our approach on standard classification and part segmentation
benchmarks to demonstrate competitive results compared to prior work. Code
is publicly available at: https://github.com/engelnico/point-transformer
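The permutation-invariance mechanism behind SortNet can be demonstrated in a few lines: a learned per-point score selects a fixed number of points, so the output ordering depends only on the scores and not on the input ordering. The feature sizes, the scoring network, and k below are illustrative assumptions, not the published configuration.

```python
# Minimal sketch of the SortNet idea: select top-k points by a learned
# score, yielding an output that is invariant to input point ordering.
import torch
import torch.nn as nn

class SortNetSketch(nn.Module):
    def __init__(self, in_dim=3, feat_dim=32, k=8):
        super().__init__()
        self.k = k
        self.embed = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())
        self.score = nn.Linear(feat_dim, 1)  # learned importance score

    def forward(self, pts):                  # pts: (B, N, 3), unordered
        feats = self.embed(pts)              # per-point features (B, N, F)
        scores = self.score(feats).squeeze(-1)           # (B, N)
        top = torch.topk(scores, self.k, dim=1).indices  # top-k by score
        idx = top.unsqueeze(-1).expand(-1, -1, feats.size(-1))
        # Gathering by score rank yields a sorted, permutation-invariant list.
        return torch.gather(feats, 1, idx)

net = SortNetSketch()
pts = torch.randn(2, 100, 3)
perm = pts[:, torch.randperm(100)]           # same points, shuffled order
print(torch.allclose(net(pts), net(perm)))   # True: order no longer matters
```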
Localizing Spatial Information in Neural Spatiospectral Filters
Beamforming for multichannel speech enhancement relies on the estimation of
spatial characteristics of the acoustic scene. In its simplest form, the
delay-and-sum beamformer (DSB) introduces a time delay to all channels to align
the desired signal components for constructive superposition. Recent
investigations of neural spatiospectral filtering revealed that these filters
can be characterized by a beampattern similar to that of traditional
beamformers, which shows that artificial neural networks can learn and
explicitly represent spatial structure. Using the Complex-valued Spatial
Autoencoder (COSPA) as an exemplary neural spatiospectral filter for
multichannel speech enhancement, we investigate where and how such networks
represent spatial information. We show via clustering that for COSPA the
spatial information is represented by the features generated by a gated
recurrent unit (GRU) layer that has access to all channels simultaneously and
that these features are not source-dependent but only
direction-of-arrival-dependent.
Comment: Submitted to the 31st European Signal Processing Conference (EUSIPCO
2023), Helsinki, Finland. 5 pages, 3 figures
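The delay-and-sum beamformer mentioned above has a compact formulation: delay each channel so the desired signal aligns across microphones, then average. The NumPy sketch below applies the delays as linear phase shifts in the frequency domain, which permits fractional-sample delays; the two-microphone geometry and source delay are made-up values.

```python
# Sketch of the delay-and-sum beamformer (DSB): time-align all channels for
# constructive superposition of the desired signal, then average them.
import numpy as np

def delay_and_sum(x, delays, fs):
    """x: (channels, samples); delays: per-channel delays in seconds."""
    n = x.shape[1]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    X = np.fft.rfft(x, axis=1)
    # A delay in time is a linear phase shift in frequency.
    X *= np.exp(-2j * np.pi * freqs[None, :] * delays[:, None])
    return np.fft.irfft(X, n=n, axis=1).mean(axis=0)

# Two-microphone example: steer toward a source whose wavefront arrives
# 0.5 ms later at the first microphone's position.
fs = 16000
t = np.arange(fs) / fs
delay = 0.0005
x = np.stack([np.sin(2 * np.pi * 440 * t),
              np.sin(2 * np.pi * 440 * (t - delay))])
y = delay_and_sum(x, delays=np.array([delay, 0.0]), fs=fs)
print(np.abs(y - np.sin(2 * np.pi * 440 * (t - delay))).mean())  # ~0
```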
Data-Free Backbone Fine-Tuning for Pruned Neural Networks
Model compression techniques reduce the computational load and memory
consumption of deep neural networks. After the compression operation, e.g.
parameter pruning, the model is normally fine-tuned on the original training
dataset to recover from the performance drop caused by compression. However,
the training data is not always available due to privacy issues or other
factors. In this work, we present a data-free fine-tuning approach for pruning
the backbone of deep neural networks. In particular, the pruned network
backbone is trained with synthetically generated images and our proposed
intermediate supervision, which encourages it to mimic the unpruned
backbone's output feature map.
Afterwards, the pruned backbone can be combined with the original network head
to make predictions. We generate synthetic images by back-propagating gradients
to noise images while relying on L1-pruning for the backbone pruning. In our
experiments, we show that our approach is task-independent because only the
backbone is pruned. By evaluating our approach on 2D human pose estimation, object
detection, and image classification, we demonstrate promising performance
compared to the unpruned model. Our code is available at
https://github.com/holzbock/dfbf.
Comment: Accepted for presentation at the 31st European Signal Processing
Conference (EUSIPCO) 2023, September 4-8, 2023, Helsinki, Finland
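Both ingredients can be sketched: images are synthesized by back-propagating a loss to noise inputs, and the pruned backbone is then trained to mimic the unpruned backbone's output feature map on those images. The networks and the image-synthesis objective below are simplified placeholders, not the paper's exact setup.

```python
# Sketch of data-free backbone fine-tuning: (i) optimize noise images via
# back-propagation, (ii) distill the unpruned backbone's output feature map
# into the pruned backbone. Networks and losses are simplified stand-ins.
import torch
import torch.nn as nn

teacher = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                        nn.Conv2d(16, 16, 3, padding=1))  # unpruned backbone
student = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                        nn.Conv2d(16, 16, 3, padding=1))  # pruned backbone
teacher.eval()

# (i) Optimize a batch of noise images so the teacher's features reach
# target statistics (a stand-in for the paper's image-synthesis loss).
imgs = torch.randn(4, 3, 32, 32, requires_grad=True)
opt_img = torch.optim.Adam([imgs], lr=0.05)
for _ in range(100):
    feats = teacher(imgs)
    loss = (feats.mean() - 0.5) ** 2 + (feats.var() - 1.0) ** 2
    opt_img.zero_grad()
    loss.backward()
    opt_img.step()

# (ii) Data-free fine-tuning: match the pruned backbone's output feature map
# to the unpruned one on the synthetic images (intermediate supervision).
synthetic = imgs.detach()
with torch.no_grad():
    target = teacher(synthetic)
opt_net = torch.optim.Adam(student.parameters(), lr=1e-3)
for _ in range(100):
    distill = (student(synthetic) - target).abs().mean()
    opt_net.zero_grad()
    distill.backward()
    opt_net.step()
print(f"feature-mimicking loss: {distill.item():.4f}")
```

Because only the backbone is fine-tuned this way, the original task head can be reattached afterwards, which is what makes the procedure task-independent.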